Conversation
Host scanning now uses globs to only get inodes for the specific files
matching the globs.
Prefix map is populated with the longest prefix for each glob
e.g. /etc/**/*.conf -> /etc/
/home/user/.ssh/id_{rsa,dsa} -> /home/user/.ssh/id_
Kernel captures events based on inode first and then prefix match (this
behavior is unchanged) and then userspace does a glob match on the path
and host_path.
Molter73
left a comment
There was a problem hiding this comment.
Awesome! Thanks for tackling this!!
fact/src/bpf/mod.rs
Outdated
| let mut new_paths = Vec::with_capacity(paths_config.len()); | ||
| let mut builder = GlobSetBuilder::new(); | ||
| for p in paths_config.iter() { | ||
| builder.add(Glob::new(&p.to_string_lossy())?); |
There was a problem hiding this comment.
We probably want to hard fail if a configured path is wrong, if we change the string at this point we might not match the strings configured by a user and we will not report things in there.
Molter73
left a comment
There was a problem hiding this comment.
Changes look good, can we add at least one integration test with globs? Just to make sure it is working, we can always expand later.
|
/retest |
| inode_key_t inode_key = inode_to_key(file->f_inode); | ||
| const inode_value_t* inode = inode_get(&inode_key); | ||
| inode_key_t* inode_to_submit = &inode_key; | ||
| switch (inode_is_monitored(inode)) { | ||
| case NOT_MONITORED: | ||
| if (!is_monitored(path)) { | ||
| goto ignored; | ||
| } | ||
| // Matched by path prefix only, not by inode. | ||
| // Set inode to NULL so userspace knows to do glob matching. | ||
| inode_to_submit = NULL; | ||
| break; | ||
| case MONITORED: | ||
| break; | ||
| } | ||
|
|
||
| submit_event(&m->file_open, event_type, path->path, &inode_key, true); | ||
| submit_event(&m->file_open, event_type, path->path, inode_to_submit, true); |
There was a problem hiding this comment.
I didn't want to use a map because the inode_key_t type is small, but we might want to have a per cpu array map and use a pointer to that to simplify the code even further. It would also reduce the amount of stack used for every program, which might be helpful in the future.
There was a problem hiding this comment.
Anyways, I don't want to block the PR on this, we can do it in a follow up.
fact/src/bpf/mod.rs
Outdated
| let mut builder = GlobSetBuilder::new(); | ||
| for p in paths_config.iter() { | ||
| builder.add( | ||
| Glob::new(&p.to_string_lossy()) |
There was a problem hiding this comment.
I thought I had mentioned this, we probably don't want to use to_string_lossy() here, this will make it so the path will not actually match if an invalid UTF-8 character is used and will silently drop events that we should emit. This will be very hard to debug, special for a user that might not be aware of this, instead we probably want to hard fail, falling back to the previous configuration or stopping altogether.
| cwd = os.getcwd() | ||
| config = { | ||
| 'paths': [monitored_dir, '/mounted', '/container-dir'], | ||
| 'paths': [f'{monitored_dir}/**/*', '/mounted/**/*', '/container-dir/**/*'], |
There was a problem hiding this comment.
It seems a bit annoying that we now have to manually specify /**/* at the end of every directory being configured. I might look into changing this behavior in the future, maybe we can add this automatically in fact if the configured path has no glob expressions in it.
There was a problem hiding this comment.
I'm also curious why this matches files directly under /mounted for instance, when the glob expressions explicitly says there should be 2 / after the path.
There was a problem hiding this comment.
why this matches files directly under /mounted
** will match zero or more path segments, so matching files in /mounted is expected behaviour
| config, config_file = fact_config | ||
| config['paths'] = [ | ||
| f'{monitored_dir}/**/*.txt', | ||
| f'{monitored_dir}/**/test-*.log', |
There was a problem hiding this comment.
Can we add a case that is something like: f'{monitored_dir}/*.cfg
This should validate we don't recursively check directories when ** is not used.
|
|
||
| ```shell | ||
| cargo test --config 'target."cfg(all())".runner="sudo -E" --features=bpf-test | ||
| cargo test --config 'target."cfg(all())".runner="sudo -E"' --features=bpf-test |
tests/test_wildcard.py
Outdated
| txt_file = os.path.join(monitored_dir, 'document.txt') | ||
| with open(txt_file, 'w') as f: | ||
| f.write('This should be captured') | ||
|
|
||
| # Should not match any pattern | ||
| log_file = os.path.join(monitored_dir, 'app.log') | ||
| with open(log_file, 'w') as f: | ||
| f.write('This should be ignored') |
There was a problem hiding this comment.
You probably want these the other way around, reason being the server.wait_events method does not wait for events after it caught the last one expected, so you want to trigger anything that should be ignored before things that should be caught.
There was a problem hiding this comment.
And yes, I know this should be improved, probably by checking there are no events leftover in the server buffer during teardown or something.
tests/test_wildcard.py
Outdated
| return config, config_file | ||
|
|
||
|
|
||
| def test_extension_wildcard(fact, wildcard_config, monitored_dir, server): |
There was a problem hiding this comment.
The fact fixture is configured as autouse, so you shouldn't need to explicitly ask for it in the tests.
There was a problem hiding this comment.
Comments in this file apply to all tests in here.
Description
Host scanning now uses globs to only get inodes for the specific files matching the globs.
Prefix map is populated with the longest prefix for each glob e.g. /etc/**/*.conf -> /etc/
/home/user/.ssh/id_{rsa,dsa} -> /home/user/.ssh/id_
Kernel captures events based on inode first and then prefix match (this behavior is unchanged) and then userspace does a glob match on the path and host_path.
Somewhat relies on a chain of PRs in the main repo (in merge order):
stackrox/stackrox#19057
stackrox/stackrox#19063
stackrox/stackrox#19089
Checklist
Automated testing
If any of these don't apply, please comment below.
Testing Performed
TODO(replace-me)
Use this space to explain how you tested your PR, or, if you didn't test it, why you did not do so. (Valid reasons include "CI is sufficient" or "No testable changes")
In addition to reviewing your code, reviewers must also review your testing instructions, and make sure they are sufficient.
For more details, ref the Confluence page about this section.